Stencil computations consume a major part of runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel implementations for cache-based multicore architectures. Temporal cache blocking is a known advanced optimization technique, which can reduce the pressure on the memory bus significantly. We apply and refine this optimization for a recently presented temporal blocking strategy designed to explicitly utilize multicore characteristics. Especially for the case of Gauss-Seidel smoothers we show that simultaneous multi-threading (SMT) can yield substantial performance improvements for our optimized algorithm.